---
title: LLM Context
sidebar_label: "LLM Context"
sidebar_position: 600
description: "Customizing the context provided to the Agent's LLM"
---

By default, the Agent will provide context based on the message history of the
thread. This context is used to generate the next message.

The context can include recent messages, as well as messages found via text and
/or vector search.

If a `promptMessageId` is provided, the context will include that message, as
well as any other messages on that same `order`. More details on order are in
[messages.mdx](./messages.mdx#message-ordering), but in practice this means that
if you pass the ID of the user-submitted message as the `promptMessageId` and
there had already been some assistant and/or tool responses, those will be
included in the context, allowing the LLM to continue the conversation.

You can also use [RAG](./rag.mdx) to add extra context to your prompt.

## Customizing the context

You can customize the context provided to the agent when generating messages
with custom `contextOptions`. These can be set as defaults on the `Agent`, or
provided at the call-site for `generateText` or others.

```ts
const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    // Values shown are the defaults.
    contextOptions: {
      // Whether to exclude tool messages in the context.
      excludeToolMessages: true,
      // How many recent messages to include. These are added after the search
      // messages, and do not count against the search limit.
      recentMessages: 100,
      // Options for searching messages via text and/or vector search.
      searchOptions: {
        limit: 10, // The maximum number of messages to fetch.
        textSearch: false, // Whether to use text search to find messages.
        vectorSearch: false, // Whether to use vector search to find messages.
        // Note, this is after the limit is applied.
        // E.g. this will quadruple the number of messages fetched.
        // (two before, and one after each message found in the search)
        messageRange: { before: 2, after: 1 },
      },
      // Whether to search across other threads for relevant messages.
      // By default, only the current thread is searched.
      searchOtherThreads: false,
    },
  },
);
```

## Full context control

To have full control over which messages are passed to the LLM, you can either:

1. Provide a `contextHandler` to filter, modify, or enrich the context messages.
2. Provide all messages manually via the `messages` argument and specify
   `contextOptions` to use no recent or search messages. See below for how to
   fetch context messages manually.

### Providing a contextHandler

The Agent will combine messages from search, recent, input messages, and all
messages on the same `order` as the `promptMessageId` if that is provided.

You can customize how they are combined, as well as add or remove messages by
providing a `contextHandler` which returns the `ModelMessage[]` which will be
passed to the LLM.

You can specify a `contextHandler` in the Agent constructor, or at the call-site
for a single generation, which overrides any Agent default.

```ts
const myAgent = new Agent(components.agent, {
  ///...
  contextHandler: async (ctx, args) => {
    // This is the default behavior.
    return [
      ...args.search,
      ...args.recent,
      ...args.inputMessages,
      ...args.inputPrompt,
      ...args.existingResponses,
    ];
    // Equivalent to:
    return args.allMessages;
  },
);
```

With this callback, you can:

1. Filter out messages you don't want to include.
1. Add memories or other context.
1. Add sample messages to guide the LLM on how it should respond.
1. Inject extra context based on the user or thread.
1. Copy in messages from other threads.
1. Summarize messages.

For example:

```ts
// Note: when you specify it at the call-site, you can also leverage variables
// available in the scope, e.g. if the user is in a specific step in a workflow.
const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    contextHandler: async (ctx, args) => {
      // Filter out messages that are not relevant.
      const relevantSearch = args.search.filter((m) => messageIsRelevant(m));
      // Fetch user memories to include in every prompt.
      const userMemories = await getUserMemories(ctx, args.userId);
      // Fetch sample messages to instruct the LLM on how to respond.
      const sampleMessages = [
        { role: "user", content: "Generate a function that adds two numbers" },
        { role: "assistant", content: "function add(a, b) { return a + b; }" },
      ];
      // Fetch user context to include in every prompt.
      const userContext = await getUserContext(ctx, args.userId, args.threadId);
      // Fetch messages from a related / parent thread.
      const related = await getRelatedThreadMessages(ctx, args.threadId);
      return [
        // Summarize or truncate context messages if they are too long.
        ...(await summarizeOrTruncateIfTooLong(related)),
        ...relevantSearch,
        ...userMemories,
        ...sampleMessages,
        ...userContext,
        ...args.recent,
        ...args.inputMessages,
        ...args.inputPrompt,
        ...args.existingResponses,
      ];
    },
  },
);
```

### Fetch context manually

If you want to get context messages for a given prompt, without calling the LLM,
you can use `fetchContextWithPrompt`. This is used internally to get the context
messages passed to the AI SDK `generateText`, `streamText`, etc.

As with normal generation, you can provide a `prompt` or `messages`, and/or a
`promptMessageId` to fetch the context messages using a given pre-saved message
as the prompt.

This will return recent and search messages combined with the input messages.

```ts
import { fetchContextWithPrompt } from "@convex-dev/agent";

const { messages } = await fetchContextWithPrompt(ctx, components.agent, {
  prompt,
  messages,
  promptMessageId,
  userId,
  threadId,
  contextOptions,
});
```

## Search for messages

This is what the agent does automatically, but it can be useful to do manually,
e.g. to find custom context to include.

For text and vector search, you can provide a `targetMessageId` and/or
`searchText`. It will embed the text for vector search. If `searchText` is not
provided, it will use the target message's text.

If `targetMessageId` is provided, it will only fetch search messages previous to
that message, and recent messages up to and including that message's "order".
This enables re-generating a response for an earlier message.

```ts
import type { MessageDoc } from "@convex-dev/agent";

const messages: MessageDoc[] = await agent.fetchContextMessages(ctx, {
  threadId,
  searchText: prompt, // Optional unless you want text/vector search.
  targetMessageId: promptMessageId, // Optionally target the search.
  userId, // Optional, unless `searchOtherThreads` is true.
  contextOptions, // Optional, defaults are used if not provided.
});
```

Note: you can also search for messages without an agent. The main difference is
that in order to do vector search, you need to create the embeddings yourself,
and it will not run your usage handler.

```ts
import { fetchRecentAndSearchMessages } from "@convex-dev/agent";

const { recentMessages, searchMessages } = await fetchRecentAndSearchMessages(
  ctx,
  components.agent,
  {
    threadId,
    searchText: prompt, // Optional unless you want text/vector search.
    targetMessageId: promptMessageId, // Optionally target the search.
    contextOptions, // Optional, defaults are used if not provided.
    getEmbedding: async (text) => {
      const embedding = await textEmbeddingModel.embed(text);
      return { embedding, textEmbeddingModel };
    },
  },
);
```

## Searching other threads

If you set `searchOtherThreads` to `true`, the agent will search across all
threads belonging to the provided `userId`. This can be useful to have multiple
conversations that the Agent can reference.

The search will use a hybrid of text and vector search.

## Passing in messages as context

You can pass in messages as context to the Agent's LLM, for instance to
implement [Retrieval-Augmented Generation](./rag.mdx). The final messages sent
to the LLM will be:

1. The system prompt, if one is provided or the agent has `instructions`
2. The messages found via contextOptions
3. The `messages` argument passed into `generateText` or other function calls.
4. If a `prompt` argument was provided, a final
   `{ role: "user", content: prompt }` message.

This allows you to pass in messages that are not part of the thread history and
will not be saved automatically, but that the LLM will receive as context.

## Manage embeddings manually

The `textEmbeddingModel` argument to the Agent constructor allows you to specify
a text embedding model to use for vector search.

If you set this, the agent will automatically generate embeddings for messages
and use them for vector search.

When you change models or decide to start or stop using embeddings for vector
search, you can manage the embeddings manually.

Generate embeddings for a set of messages. Optionally pass `config` with a usage
handler, which can be a globally shared `Config`.

```ts
import { embedMessages } from "@convex-dev/agent";

const embeddings = await embedMessages(
  ctx,
  { userId, threadId, textEmbeddingModel, ...config },
  [{ role: "user", content: "What is love?" }],
);
```

Generate and save embeddings for existing messages.

```ts
const embeddings = await supportAgent.generateAndSaveEmbeddings(ctx, {
  messageIds,
});
```

Get and update embeddings, e.g. for a migration to a new model.

```ts
const messages = await ctx.runQuery(components.agent.vector.index.paginate, {
  vectorDimension: 1536,
  targetModel: "gpt-4o-mini",
  cursor: null,
  limit: 10,
});
```

Updating the embedding by ID.

```ts
const messages = await ctx.runQuery(components.agent.vector.index.updateBatch, {
  vectors: [{ model: "gpt-4o-mini", vector: embedding, id: msg.embeddingId }],
});
```

Note: If the dimension changes, you need to delete the old and insert the new.

Delete embeddings

```ts
await ctx.runMutation(components.agent.vector.index.deleteBatch, {
  ids: [embeddingId1, embeddingId2],
});
```

Insert embeddings

```ts
const ids = await ctx.runMutation(components.agent.vector.index.insertBatch, {
  vectorDimension: 1536,
  vectors: [
    {
      model: "gpt-4o-mini",
      table: "messages",
      userId: "123",
      threadId: "123",
      vector: embedding,
      // Optional, if you want to update the message with the embeddingId
      messageId: messageId,
    },
  ],
});
```
